Abstract


This  project  focuses  on  how  humans  master  new  categories  by  learning  from 
examples  with  extension  to  dynamic  environments.  Decision  making  tends  to 
take  place  in  dynamic  environments  in  which  successive  decisions  are  contingent 
on  one  another,  and  in  which  the  rewards  associated  with  actions  can  be  delayed, 
yet  most  tasks  that  have  been  studied  in  the  laboratory  are  broken  up  into  brief, 
independent  trials  (e.g.,  classification  of  a  stimulus)  in  which  responses  are 
determined  only  by  the  immediate  context  and  have  no  bearing  on  future  states 
of  the  task  environment.  Thus,  this  project  narrows  the  gap  between  the  range  of 
mental  processes  typically  addressed  by  cognitive  scientists  and  the  mental 
processes  that  underlie  performance  in  Air  Force  relevant  activities.  We  find  that 
people's  performance  profiles  are  generally  consistent  with  modern 
reinforcement  learning  models.  For  example,  including  perceptual  information 
that  disambiguates  a  person's  current  state  within  a  task  improves  performance. 
Additionally, consistent with model-based predictions, people appear to hill-climb
on the reward gradient rather than globally optimize performance, and they
show other suboptimal behavior, such as poorer performance under certain
circumstances when given more information about response options.


Project  Overview 

In  this  project,  the  PI  and  his  collaborators  have  made  progress  in  understanding 
human  category  learning  and  have  extended  this  work  to  dynamic  decision 
making  environments.  Below,  findings  from  this  project  are  briefly  reviewed. 
Following this review, doctoral students who have graduated during this project
are listed, as are our project publications.

Todd  Gureckis  and  the  PI  have  published  a  number  of  articles  that  develop  the 
sequential  learning  aspects  of  the  project.  In  the  Cognitive  Science  article,  we 
conduct  a  formal  model  comparison  of  simple  recurrent  and  buffer  networks  and 
find  that  the  simpler  buffer  networks  do  a  better  job  of  characterizing  human 
learning and sequential performance. Surprisingly, there has been little previous
fine-grained evaluation of sequential learning models. We derived predictions from
our  buffer  network  and  found  a  strong  linear  (through  time)  constraint  on 
human  sequential  learning  that  is  not  present  in  human  category  learning. 

In  two  papers,  one  published  in  the  Journal  of  Mathematical  Psychology  and  the 
other  in  Cognition,  we  explore  human  learning  and  decision  making  in  a 
dynamic  environment  in  which  short-  and  long-term  rewards  are  put  in  conflict. 
We  find  that  people  can  learn  to  make  long-term  responses  when  state  cues  are 
present  that  de-alias  underlying  system  states  and  allow  for  generalization  of 
rewards  to  yet  unexplored  states.  In  noisy  environments,  we  find  that  noise  on 
state  cues  is  much  more  detrimental  to  human  and  model  performance  than  is 
equivalent  noise  on  rewards,  even  though  rewards  define  the  learning  problem. 
In fact, moderate levels of noise on rewards can be beneficial because they
encourage exploration in a task in which humans and models underexplore.


We  use  simple  reinforcement  learning  models  to  derive  our  study  designs  and 
characterize  our  results. 
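
As a rough illustration of the kind of simple reinforcement learning model referred to above, the following Python sketch runs a tabular Q-learner on a task in which short- and long-term rewards conflict. It is not the model or task from the cited papers. The reward functions, the 10-trial history window, the use of the recent count of long-term choices as the state cue, and all parameter values are assumptions chosen only for illustration.

```python
import random
from collections import deque


def reward_fn(action, n_long, window=10):
    """Hypothetical payoff. action 1 = long-term option, 0 = short-term option.

    Both payoffs rise with the number of recent long-term choices (n_long),
    but the short-term option always pays 0.2 more on the current trial.
    """
    base = 0.1 + 0.6 * n_long / window
    return base + (0.2 if action == 0 else 0.0)


def run_agent(use_state_cue, n_trials=2000, window=10, alpha=0.1,
              gamma=0.9, epsilon=0.1, seed=0):
    """Return the mean reward per trial earned by an epsilon-greedy Q-learner."""
    rng = random.Random(seed)
    history = deque([0] * window, maxlen=window)  # 1 marks a long-term choice
    # With a state cue the agent observes the recent long-term count;
    # without it, every trial looks like the same (aliased) state.
    n_states = window + 1 if use_state_cue else 1
    q = [[0.0, 0.0] for _ in range(n_states)]

    def observe():
        return sum(history) if use_state_cue else 0

    total = 0.0
    state = observe()
    for _ in range(n_trials):
        if rng.random() < epsilon:
            action = rng.randrange(2)          # explore
        else:
            action = 0 if q[state][0] >= q[state][1] else 1  # exploit
        reward = reward_fn(action, sum(history), window)
        history.append(action)
        next_state = observe()
        # Standard one-step Q-learning update.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
        total += reward
    return total / n_trials


if __name__ == "__main__":
    print("with state cue:   ", run_agent(True))
    print("without state cue:", run_agent(False))
```

Comparing the two calls gives a sense of how de-aliasing the state can matter for a learner of this type. Whether the sketch shows the same qualitative pattern as the human data depends on the assumed payoffs and parameters.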


Three  other  papers  have  been  published  exploring  human  learning  and  decision 
making  when  short-  and  long-term  rewards  are  in  conflict.  In  a  paper  published 
in  Psychonomic  Bulletin  &  Review,  we  examined  whether  state  cues  make 
people  more  rational  or  just  more  sensitive  to  the  gradient  of  reward  as  our 
models  predict.  By  comparing  performance  when  reward  curves  are  close  or  far 
apart,  we  found  that  state  cues  led  people  to  be  more  sensitive  to  reward 
gradient,  not  more  rational.  People  hill  climbed  toward  states  with  increasing 
rewards  even  when  doing  so  was  not  optimal.  In  a  Judgment  and  Decision 
Making  paper,  we  found  (as  reinforcement  learning  models  predict)  that  giving 
additional  information  about  forgone  rewards  (i.e.,  information  about  the  choice 
option  not  selected)  lowers  performance  (i.e.,  people  meliorate  and  choose  the 
short-term  option).  Finally,  in  a  Journal  of  Experimental  Psychology:  Learning, 
Memory,  &  Cognition  paper,  we  manipulate  people's  motivational  focus  and 
find  a  systematic  effect  on  people's  exploration  strategies.  In  particular,  people 
are  more  streaky  (i.e.,  explore  systematically  by  making  a  number  of  identical 
responses  consecutively)  when  in  a  regulatory  fit  motivational  state. 

In two papers (one in Memory & Cognition and one in Psychological Science), we find
that  people's  estimation  of  category  mean  and  variance  is  consistent  with  error- 
driven  learning  models  that  make  sequential  updates.  In  the  Psychological 
Science  paper,  we  find  that  people's  conceptions  of  categories  distort  away  from 
contrasting  categories.  The  mechanisms  we  explore  in  these  papers  can  explain 
high-level  idealization  effects. 
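
To make the idea of error-driven sequential updating concrete, here is a minimal Python sketch of a delta-rule learner that tracks a category's mean and variance one example at a time. It is an illustration under assumed settings, not the model fitted in the papers. The function name, the learning rate, and the starting estimates are hypothetical.

```python
def delta_rule_estimates(samples, lr=0.1, mean0=0.0, var0=1.0):
    """Update running mean and variance estimates after each observed example.

    Each estimate moves a fraction `lr` of the way toward the value implied
    by the current example, so recent examples carry more weight than old ones.
    """
    mean, var = mean0, var0
    for x in samples:
        error = x - mean                 # prediction error for this example
        var += lr * (error ** 2 - var)   # squared error drives the variance estimate
        mean += lr * error               # signed error drives the mean estimate
    return mean, var


if __name__ == "__main__":
    print(delta_rule_estimates([4.0, 6.0, 5.0, 5.5, 4.5] * 40))
```

Because the updates are sequential, the final estimates depend on the order of the examples, which is one way such models differ from computing the sample mean and variance over all examples at once.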

Finally,  in  a  second  Memory  &  Cognition  paper,  we  find  evidence  for  two 
pathways  for  stimulus  encoding.  We  borrow  theoretical  ideas  from  the  object 
recognition literature. We find that one pathway, which experts use, is holistic,
whereas the second pathway is more part-based or discrete. This latter pathway
requires  effortful  processing  to  decompose  and  analyze  stimulus  parts. 
Although  many  researchers  have  explored  the  possibility  that  there  are  multiple 
learning  systems  in  the  brain,  fewer  have  explored  the  possibility  that  visual 
stimuli  can  be  encoded  in  multiple  formats. 

A  final  journal  article  most  closely  related  to  the  proposed  work  is  the  Maddox  et 
al.  contribution.  In  that  paper,  rule-based  and  information-integration  category 
learning  were  compared  under  minimal  and  full  feedback  conditions.  Rule-based 
category  structures  are  those  for  which  the  optimal  rule  is  verbalizable. 
Information-integration  category  structures  are  those  for  which  the  optimal  rule 
is not verbalizable. With minimal feedback, subjects are told whether their
response was correct or incorrect, but are not informed of the correct category
assignment. With full feedback, subjects are informed of the correctness of their
response and are also informed of the correct category assignment. An
examination  of  the  distinct  neural  circuits  that  subserve  rule-based  and 
information-integration  category  learning  leads  to  the  counterintuitive  prediction 
that  full  feedback  should  facilitate  rule-based  learning  but  should  also  hinder 
information-integration learning. These predictions held. The results were
modeled by a reinforcement learning system and a Bayesian hypothesis testing
system whose outputs were combined by a gating mechanism. The
reinforcement learning system's processing of only feedback valence was
explained by appeal to the additional dynamic tasks it subserves, such as
motor control and the kinds of problems considered in the aforementioned
Gureckis and Love papers.